Impute household bus fare spending from LCFS #428

Merged
vahid-ahmadi merged 10 commits into main from add-bus-fare-spending-imputation
Jun 17, 2026

vahid-ahmadi commented Jun 16, 2026
Collaborator

What

Adds bus_fare_spending as a new output of the LCFS-trained consumption QRF, summed from the detailed COICOP 7.3.2 bus & coach fare codes (c73212, c73213, c73214) in the LCFS dvhh file. It is annualised with the standard WEEKS_IN_YEAR constant and CPI-uprated like the other consumption categories. Also adds bus calibration smoke-test targets.

Why

policyengine-uk can model rail fare policy but not bus, because the only bus variable (bus_subsidy_spending) is the government subsidy benefit-in-kind from ETB — not the fare passengers pay. A fare reform (flat/£2 fare, free travel for young people) changes passenger fares, so we need household bus fare expenditure. This is Step 1 toward that. See #427.

Approach

Follows the existing petrol_spending / diesel_spending pattern exactly:

  • New constant BUS_FARE_LCFS_CODES = ["c73212", "c73213", "c73214"] (COICOP 7.3.2; excludes rail 7.3.1, air, combined tickets, taxis).
  • generate_lcfs_table derives bus_fare_spending (sum of those codes), added to the annualise list.
  • Added to IMPUTATIONS, which feeds the model metadata (impute_variables), so the cached consumption model auto-invalidates and retrains — no manual version bump.
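
As a rough illustration of the derivation described above, here is a minimal pure-Python sketch of summing the three codes for one household and annualising the result. The function name `annual_bus_fare_spending`, the dict-based row, the example values, and the 52-week constant are assumptions made for this sketch; the real generate_lcfs_table works on the LCFS dvhh table and uses the project's own WEEKS_IN_YEAR.

```python
# Assumption: 52 weeks per year. The project's WEEKS_IN_YEAR may differ.
WEEKS_IN_YEAR = 52

# Codes named in this PR (COICOP 7.3.2 bus & coach fares).
BUS_FARE_LCFS_CODES = ["c73212", "c73213", "c73214"]


def annual_bus_fare_spending(household_row: dict) -> float:
    """Sum the weekly bus & coach fare codes for one household and annualise.

    Indexing the row directly raises KeyError if a code is missing,
    so a renamed or removed LCFS column fails loudly.
    """
    weekly_total = sum(household_row[code] for code in BUS_FARE_LCFS_CODES)
    return weekly_total * WEEKS_IN_YEAR


# Hypothetical weekly values in GBP for a single household.
example_row = {"c73212": 2.0, "c73213": 0.5, "c73214": 0.0}
annual = annual_bus_fare_spending(example_row)
print(annual)  # 130.0
```

Indexing the codes directly, rather than tolerating absent columns, mirrors the fail-loud behaviour described for petrol and diesel.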

Calibration smoke-test targets

test_aggregates.py now includes:

  • bus_fare_spending: GBP 3.4bn — DfT Annual Bus Statistics (year ending March 2025), passenger fare receipts for local bus services in England (~52% of operating revenue). The LCFS input is UK household bus/coach fare spending, so this is an order-of-magnitude smoke target until a direct UK/GB household target is available.
  • bus_subsidy_spending: GBP 2.5bn — re-enabled as an approximate public-support smoke-test target.

CI runs make download and then make test against the prebuilt dataset, which won't contain bus_fare_spending until a dataset built with this imputation is published. To avoid a red build, test_aggregates skips any variable not present in the loaded dataset (baseline.input_variables). This is self-activating: once a rebuilt dataset is published, the column appears and the target is checked for real, with no manual follow-up.
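
The skip-when-absent behaviour described here can be sketched as follows. This is not the code in test_aggregates.py: the helper name `split_targets` is hypothetical, and a plain set stands in for baseline.input_variables.

```python
def split_targets(targets: dict, input_variables: set) -> tuple:
    """Separate targets that can be checked from those that must be skipped.

    A target is checkable only if its variable exists in the loaded dataset.
    """
    checkable = {v: t for v, t in targets.items() if v in input_variables}
    skipped = sorted(v for v in targets if v not in input_variables)
    return checkable, skipped


# Targets in GBP, taken from the list above.
targets = {"bus_fare_spending": 3.4e9, "bus_subsidy_spending": 2.5e9}

# Hypothetical dataset built before this imputation existed.
old_dataset_variables = {"bus_subsidy_spending", "rail_subsidy_spending"}

checkable, skipped = split_targets(targets, old_dataset_variables)
print(skipped)  # ['bus_fare_spending']
```

Once the dataset gains a bus_fare_spending column, the same call returns an empty skipped list and both targets are checked.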

Verified locally against the default dataset: bus_subsidy_spending 2.21bn (11.6% rel err, passes), rail_subsidy_spending 22.86bn (5.8%, passes), bus_fare_spending skips (not yet in dataset).
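
For reference, the 11.6% figure is the absolute difference from the target divided by the target. A small sketch of that arithmetic, assuming this is the definition of relative error the test uses (the pass/fail tolerance is not stated here, so it is left out):

```python
def relative_error(estimate: float, target: float) -> float:
    """Absolute deviation from the target, as a fraction of the target."""
    return abs(estimate - target) / abs(target)


# bus_subsidy_spending: 2.21bn simulated against the 2.5bn target.
bus_subsidy_rel_err = relative_error(2.21e9, 2.5e9)
print(f"{bus_subsidy_rel_err:.1%}")  # 11.6%
```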

Sanity check

LCFS 2023/24 implied UK bus/coach fare spending is approximately GBP 2.66bn/yr pre-uprating; this uprates toward the GBP 3.4bn order-of-magnitude smoke target at FY26/27 prices. Spending is nonzero in only a minority of households: sparse, as expected for a short-diary survey, and the same sparsity the existing consumption QRF already handles.
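
To put the two figures side by side, the snippet below computes the ratio of the smoke target to the LCFS-implied total. This is only the gap between the two numbers quoted above, not the CPI uprating factor the pipeline actually applies.

```python
lcfs_implied = 2.66e9  # GBP, LCFS 2023/24, pre-uprating
smoke_target = 3.4e9   # GBP, DfT England local bus fare receipts

ratio = smoke_target / lcfs_implied
print(f"{ratio:.2f}")  # 1.28
```

The target is roughly 28% above the pre-uprating LCFS figure, which is why it is treated as an order-of-magnitude check rather than a precise calibration target.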

Companion change required (separate PR, see #427)

To be consumed, policyengine-uk needs a matching bus_fare_spending input Variable — added in PolicyEngine/policyengine-uk#1780. Until then the column is harmlessly skipped on load (simulation.py ignores unknown columns), so this PR is safe to merge on its own.

Not in scope

Person-level allocation by age (needs NTS) and bus fare reform parameters/variables — tracked in #427.

🤖 Generated with Claude Code

vahid-ahmadi and others added 2 commits June 16, 2026 14:21
Add bus_fare_spending as a new output of the consumption QRF, summed from
the detailed LCFS bus & coach fare codes (c73212/c73213/c73214), annualised
and CPI-uprated like other consumption categories. This gives the passenger
fare households pay, distinct from bus_subsidy_spending (the ETB
government-subsidy benefit-in-kind), as a building block for modelling bus
fare reforms.

Recorded household-level only; person-level allocation by age (for e.g. a
young-person fare scheme) needs an external NTS usage profile.

Refs #427.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
vahid-ahmadi and others added 5 commits June 16, 2026 14:37
Add a bus_fare_spending aggregate target (GBP 3.4bn passenger fare receipts,
DfT Annual Bus Statistics year ending March 2025) and re-enable the
bus_subsidy_spending target (GBP 2.5bn). Guard test_aggregates to skip any
variable not present in the loaded dataset, so bus_fare_spending self-activates
once a dataset built with the new imputation is published rather than failing
on a default-zero aggregate against the currently-downloaded dataset.

Verified locally against the default dataset: bus_subsidy_spending 2.21bn
(11.6% rel err, passes), bus_fare_spending skips (not yet in dataset).

Refs #427.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The LCFS ingestion test builds a minimal header without the bus & coach
sub-codes, which raised KeyError. Sum whichever of the granular COICOP 7.3.2
sub-codes are present (they are sparse and the exact set can vary across LCFS
vintages); a wholesale disappearance is caught by the bus_fare_spending
aggregate smoke test. Add the codes to the ingestion fixture and assert the
annualised bus fare.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

_download_workbook did requests.get + raise_for_status with no retry, so a
single OBR 429 (rate limit) dropped the OBR target set and failed
test_target_registry::test_obr_income_tax_value with StopIteration. Add
bounded exponential-backoff retry on 429/5xx and connection errors, honouring
a numeric Retry-After header; lru_cache still downloads each workbook at most
once per run on success.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
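
A minimal sketch of the bounded exponential-backoff retry described in the commit above. It is not the code in _download_workbook: the function `fetch_with_retry`, its parameters, and the injected `fetch` callable (returning status, headers, body) are assumptions chosen so the example runs without network access or the requests library.

```python
import time

# Statuses worth retrying: rate limiting and server errors.
RETRYABLE = {429} | set(range(500, 600))


def fetch_with_retry(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is exhausted.

    fetch() must return (status, headers, body). Waits double on each
    attempt unless the response carries a numeric Retry-After header.
    """
    for attempt in range(max_attempts):
        last_attempt = attempt == max_attempts - 1
        backoff = base_delay * 2 ** attempt
        try:
            status, headers, body = fetch()
        except ConnectionError:
            if last_attempt:
                raise
            sleep(backoff)
            continue
        if status < 400:
            return body
        if status not in RETRYABLE or last_attempt:
            raise RuntimeError(f"request failed with HTTP {status}")
        retry_after = headers.get("Retry-After", "")
        # Honour Retry-After only when it is a plain number of seconds.
        sleep(float(retry_after) if retry_after.isdigit() else backoff)
    raise ValueError("max_attempts must be at least 1")


# Simulated server: rate-limited once, one server error, then success.
responses = iter([
    (429, {"Retry-After": "3"}, b""),
    (503, {}, b""),
    (200, {}, b"workbook"),
])
waits = []
body = fetch_with_retry(lambda: next(responses), sleep=waits.append)
print(body, waits)  # b'workbook' [3.0, 2.0]
```

Injecting `sleep` keeps the sketch testable without real delays; a non-retryable status such as 404 raises immediately rather than consuming attempts.
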
Sum the bus & coach COICOP codes explicitly (fail loud if a column is missing,
matching the petrol/diesel pattern) rather than tolerating absent sub-codes.
Drop the test_aggregates skip guard and instead record bus_fare_spending as a
commented-out target (repo convention) to enable once a dataset with the
imputation is published; bus_subsidy_spending stays active.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The codes were verified against LCFS 2021/22 but the current release is
2023/24; reword the comment to state codes are confirmed for 2021/22 and must
be re-confirmed when bumping CURRENT_LCFS_RELEASE, and that they resolve
directly at build time (a renamed/removed code fails loudly).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Collaborator Author

Review notes from cross-checking this with PolicyEngine/policyengine-uk#1780 and the existing rail fare pattern:

  1. The imputation direction looks reasonable for a first household passenger-fare input: summing the detailed LCFS COICOP 7.3.2 bus/coach fare codes, annualising with WEEKS_IN_YEAR, adding it to IMPUTATIONS, and keeping it distinct from bus_subsidy_spending matches the intended use.

  2. The DfT £3.4bn target needs a coverage caveat. The Annual Bus Statistics collection covers GB, but the £3.4bn passenger fare receipts figure in the publication is for local bus services in England only. It also excludes long-distance coach services, while this variable is described as bus and coach fares. That makes it useful as an order-of-magnitude smoke test, but not a clean UK/GB target for the LCFS household variable without adjustment.

  3. The PR body says test_aggregates now skips variables absent from the loaded dataset and will self-activate once the rebuilt dataset has bus_fare_spending. The actual diff does not implement that skip; bus_fare_spending is still commented out, so nothing will self-activate. Either implement the described skip-and-check behavior or update the PR text to say the target is recorded but disabled.

  4. For future young-person bus fare policies: this gives a household baseline spend, but unlike rail in policyengine-uk there is not yet a quantity/price split or a person-level allocation by age. Rail has rail_usage × fare_index; bus will need an additional allocation/profile layer, probably from NTS bus-use shares by age, before free fares for young people can be modelled defensibly.
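
To make point 4 concrete, here is one possible shape for that allocation layer: split household spend across members in proportion to an age-based usage weight. Everything in it is hypothetical, including the function names, the age threshold, and the weights, which are placeholders rather than NTS figures.

```python
def allocate_household_spend(household_spend, ages, usage_weight):
    """Split household spend across members in proportion to usage weights."""
    weights = [usage_weight(age) for age in ages]
    total = sum(weights)
    if total <= 0:
        raise ValueError("usage weights must sum to a positive number")
    return [household_spend * w / total for w in weights]


def placeholder_weight(age):
    """Illustrative only: assume under-22s use buses twice as much."""
    return 2.0 if age < 22 else 1.0


# Hypothetical household: GBP 300/yr of bus fares, members aged 17, 45, 50.
shares = allocate_household_spend(300.0, [17, 45, 50], placeholder_weight)
print(shares)  # [150.0, 75.0, 75.0]
```

The person-level shares always sum back to the household total, so a layer like this would leave the imputed bus_fare_spending aggregate unchanged.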

vahid-ahmadi and others added 2 commits June 17, 2026 11:01
Verified c73212/c73213/c73214 exist in dvhh_ukanon_v2_2023.tab; implied UK
bus/coach fare spend ~GBP 2.66bn (2023/24, pre-uprating), consistent with the
GBP 3.4bn smoke target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Collaborator Author

Follow-up: I pushed 8b11d35 to address the aggregate-test/body mismatch and target coverage wording.

The bus_fare_spending smoke target is now enabled in AGGREGATES but skipped while absent from the loaded dataset, and the comments/PR body now describe the GBP 3.4bn DfT figure as England local bus passenger receipts used as an order-of-magnitude target for the broader UK LCFS household bus/coach fare input.

Re-comment the bus_fare_spending smoke target and drop the
skip-when-absent guard reintroduced in 8b11d35, per the no-fallbacks
decision. bus_subsidy_spending stays active; bus_fare_spending is enabled
manually once a dataset with the imputation is published. Keeps the
improved England-receipts caveat in the comment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>